TARGET DNA

The present invention relates to collections of labelled target DNA. 

Recent years have seen a growth in the realisation of the importance of gene expression in
the control of biological activities. It is known that expression of specific subsets of
genes regulates tissue formation and organogenesis during development and also the
properties of adult tissues. Patterns of gene expression influence not only the structure 
and composition of specific tissues, but also the tissues' responses to various stimuli. 
These structures, composition and responses, and the patterns of gene expression 
encoding them, are distinctive markers for individual tissues. 

At a more complex level the pattern of genes expressed by whole organisms may be
characteristic of specific individuals and provide an insight into their biological status.
For instance, there is growing evidence that the pattern of genes expressed by an
individual may influence factors such as the individual's predisposition to particular 
diseases or their responsiveness to certain therapeutic agents. 

The current challenge to biologists is to learn how the products of the around 40,000
identified human genes interact to produce the complexity exhibited by higher eukaryotes.
To a large extent the biological character of a cell can be inferred from the profile of
genes it expresses. Although an examination of mRNA or protein expression patterns
alone does not directly address function, the knowledge of when and where a gene is
expressed can provide valuable insights as to the potential role of a gene and has 
historically been instrumental in the discovery of developmentally regulated genes. 
Recognition of the value of the examination of expression patterns led to the development 
of a plethora of advanced mRNA profiling technologies such as cDNA microarrays 
(Duggan et al., 1999), SAGE (Velculescu et al., 1995), and cDNA display (Liang and 
Pardee, 1992) aimed at the simultaneous measurement of tens to several thousand genes 
in the target samples. Application of these profiling technologies to clinical diseases, such
as cancer, has confirmed the utility of profiling and provided useful diagnostic and
prognostic assays (Shipp et al., 2002; Staunton et al., 2001; van 't Veer et al., 2002).
Despite the success of these approaches at the molecular level in identifying patterns of
expression exhibited generally by relatively homogeneous cellular samples, the cellular
complexity of higher eukaryotes still presents a major obstacle to expression profiling.

Over the last 30 years a variety of molecular techniques have been developed for the 
analysis of gene-expression. In general, methods have focussed either on the identification and
characterisation of genes (either individual genes or networks of related genes) or on the
characterisation of the input tissue or cell based on a characteristic profile of expressed
genes. Although conventional nucleic acid hybridization techniques (such as northern and
dot blots) have been used for many years to analyse a small number of genes and samples,
a variety of advanced mRNA profiling technologies such as cDNA
microarrays (Duggan et al., 1999), SAGE (Velculescu et al., 1995), and cDNA display
(Liang and Pardee, 1992) have recently been developed to allow the simultaneous
measurement of tens to several thousand genes in the target samples.

Many of the techniques for analysis of gene-expression described above require the use of
labelled target DNA capable of binding to complementary DNA sequences in reference 
samples. In order to take both full advantage of and to extend recent improvements in 
gene-expression analysis it is important that the labelled target DNA be sensitive, that is 
to say having a high binding affinity for complementary DNA sequences. It is also 
beneficial to be able to produce labelled target DNA from small samples, ideally single 
cells, since this allows a greater range of cell types to be used (since it obviates a 
requirement for large numbers of cells), and improves confidence that the starting
population is "pure", rather than representing a mixed population of cell types such as is
found in many tissue samples. Furthermore, it is advantageous if labelled target DNA can 
be produced rapidly, by cheap simple techniques. Unfortunately many known collections 
of labelled target DNA suffer from disadvantages in that they have relatively low 
sensitivity, or are prepared by laborious, complicated or expensive techniques. 

It is an object of the present invention to obviate or mitigate the disadvantages associated 
with the prior art. 



According to the present invention there is provided a collection of labelled target DNA 
molecules which are exonuclease derivatives of cDNA.

Collections of labelled target DNA molecules according to the invention provide a 
number of advantages over prior art target DNA collections, as set out below. Briefly, 
target DNA collections of the invention provide advantages in terms of their enhanced 
sensitivity, their ability to be prepared from small samples, and their ease and cost of 
preparation.

Collections of labelled target DNA molecules according to the invention have greater 
sensitivity than previously described targets since the single-stranded target DNA 
molecules of the invention are not susceptible to "self-hybridisation". Thus collections of 
labelled target DNA according to the invention, when used in a hybridisation-based assay,
are more readily able to hybridise with complementary DNA sequences in a reference
sample, should such sequences be present. Furthermore, preparation of the target
molecules is more flexible, cheaper and simpler than prior art techniques. These 
advantages arise from the fact that the collection of target molecules can be prepared from 
small amounts of starting material (thereby avoiding costly purification steps and 
increasing the variety of samples from which labelled target DNA can be prepared), and 
can be prepared using cheap, simple techniques. Furthermore, since labelled target DNA 
of the invention can be prepared from samples as small as a single cell, it is possible to
ensure that the starting population from which the target DNA is prepared represents a 
pure population as opposed to a mixture of different cell types. 

The collection of labelled target DNA molecules may be prepared by a number of 
different methods. The methods described below are simple, allowing easy, cost-effective
preparation of the collections of labelled target DNA. 

One method suitable for the preparation of a collection of labelled target DNA molecules 
according to the invention is to subject cDNA or a derivative thereof (e.g. a DNA 
population produced by total or partial amplification of the cDNA population) to
exonuclease digestion such that a collection of essentially single-stranded DNA molecules
is produced, and then to label these single-stranded molecules. 

Conveniently the single-stranded molecules may be labelled by incorporation of labelled 
nucleotides at the 3' end of the single-stranded DNA molecules using the template-
independent DNA polymerase terminal transferase.

An alternative method by which collections of the invention may be prepared is to treat 
double-stranded cDNA or a derivative thereof to obtain a labelled double-stranded DNA 
population and then to effect exonuclease digestion of the labelled DNA population. 
Production of labelled double-stranded DNA from the double-stranded cDNA population 
(or derivative) may, for example, be effected by addition of labelled nucleotides via the 
DNA polymerase terminal transferase (as described above). Alternatively a labelled 
double-stranded DNA population may be derived from the cDNA or derivative through 
amplification of the original cDNA by well-known polymerase chain reaction (PCR)
techniques using labelled nucleotides. The labelled double-stranded DNA population
may then be subjected to exonuclease digestion in order to produce a substantially single-
stranded labelled DNA population. In cases in which the label is incorporated using PCR
this provides an advantage in that the efficiency of label incorporation can be readily
assessed by gel electrophoresis and/or real-time quantitative PCR.

Conveniently labelled double-stranded DNA representative of gene expression in a
sample of interest may be prepared using primers comprising a homopolymer T tract (for
example GATCTCGAGCGGCCGCTTTTTTTTTTTTTT). An example
of this amplification technique is described in Brady et al. (1990). When combined with
homopolymer tailing (for example using terminal transferase) PCR using such primers
produces a population of DNA molecules in which all molecules have a poly-T region at
one end and a poly-A region at the other. This technique has the advantage that a single
oligonucleotide can be used for the initial and all subsequent PCR amplifications. The
technique also obviates the need to create new priming sites within the molecules to be
amplified, since each molecule produced by amplification contains a poly-A region that
can anneal to a poly-T region in the primer, allowing further rounds of amplification.
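
The self-priming property described above can be illustrated with a short Python sketch. It is an illustration only: the insert sequence, the length of the T tract and the helper names are assumptions made for the example and are not taken from the described method.

```python
# Illustrative sketch: why a single poly-T primer can prime both strands
# of a product that carries the primer at one end and its complement
# (including a poly-A region) at the other.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


T_TRACT_LENGTH = 14  # assumed length, chosen for illustration only
PRIMER = "GATCTCGAGCGGCCGC" + "T" * T_TRACT_LENGTH


def amplified_strand(insert: str) -> str:
    """Top strand of a product: primer at the 5' end, its complement at the 3' end."""
    return PRIMER + insert + reverse_complement(PRIMER)


def primer_can_anneal(strand: str) -> bool:
    """The primer anneals when the 3' end of the strand is complementary to it."""
    return strand.endswith(reverse_complement(PRIMER))


top = amplified_strand("ACGTTGCAGGCT")  # hypothetical cDNA insert
bottom = reverse_complement(top)

print(primer_can_anneal(top), primer_can_anneal(bottom))
```

Because the reverse complement of the product again begins with the primer and ends with the primer's complement, the same oligonucleotide can anneal to either strand in every subsequent round.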

Primers comprising a poly-T tail (as described above) may also comprise a further
sequence of nucleotides in addition to the tail region. Such further nucleotides may be
selected to allow the incorporation into DNA molecules, produced by PCR using these
primers, of regions that may be advantageous for the further amplification or subsequent
use of molecules produced. For example primers may be designed such that they will
incorporate "anchor" sequences (thereby enabling improved specificity of subsequent
PCR) or cloning sites (allowing subsequent manipulation of amplified DNA products).
Suitable sequences for incorporation into such primers to achieve these purposes would 
be immediately appreciated by one skilled in the art. 

It will be appreciated that the methods described above may be effected by the use of a kit 
according to appropriate instructions. Accordingly there is provided a kit for the 
preparation of a collection of labelled target DNA molecules according to the invention, 
the kit comprising: 

(i) an exonuclease; 

(ii) terminal transferase; and 

(iii) labelled nucleotides. 

There is also provided a kit for the preparation of a collection of labelled target DNA 
molecules according to the invention, the kit comprising: 

(i) an exonuclease; 

(ii) primers; and 

(iii) labelled nucleotides. 

A collection of target DNA molecules according to the invention may be labelled by 
incorporation of labelled nucleotides within the DNA molecules. Labelled nucleotides
may incorporate a detectable moiety, or may contain a functional group (e.g. an amino
group) that is subsequently able to react with a detectable moiety. Suitable detectable
moieties include fluorescent moieties (fluorophores), radio-labelled moieties, and enzymes
capable of producing a chromogenic reaction with a suitable substrate. In a preferred
embodiment DNA molecules according to the invention are directly labelled by
incorporation of nucleotides labelled with fluorescent moieties. This technique provides
the advantage that relatively small quantities of fluorescent label are required. This has
obvious benefits in terms of reducing the cost associated with the production of labelled
target DNA. Suitable examples of commercially available fluorescently labelled 
nucleotides include FluoroLink nucleotides, which are supplied by Amersham Pharmacia 
Biotech. 

In a preferred embodiment of the invention the cDNA from which the labelled single-
stranded DNA population is derived is globally amplified cDNA. By globally amplified
cDNA we mean cDNA in which DNA molecules representing gene expression retain the
same relative abundance as the mRNA transcripts from which they are derived.

There are a number of known techniques by which globally amplified cDNA suitable for 
use in the invention may be produced. Most preferably the globally amplified cDNA is
prepared from mRNA using limiting concentrations of nucleotides and a relatively short
incubation time in order to limit cDNA synthesis. This ensures that, no matter what the
length of the original mRNA transcript, all cDNA molecules produced are of
approximately the same relatively small size. Since all the cDNA molecules are of
approximately equal size, subsequent amplification of the cDNA results in equal
reproduction of all the cDNA molecules present. This ensures that the amplified cDNA
produced reflects the original relative abundance of the mRNA present in the biological
sample. Suitable protocols for the production of globally amplified cDNA of this nature
are provided in Brady et al. 1990, Cumano et al. 1992 and Brady et al. 1993. In addition
to the advantage of allowing the production of amplified populations of cDNA that
maintain the relative abundance of the original mRNA, the use of globally amplified cDNA
also provides other advantages. For example globally amplified cDNA can be derived
either directly from one or more freshly isolated living cells without the need for RNA
isolation, or from mRNA purified from a biological sample. Additionally, the production
of globally amplified cDNA is well suited to automation, providing advantages in terms of ease and
speed of use. 
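
The reasoning that equal-sized molecules preserve relative abundance on amplification can be sketched numerically. The following Python example assumes a simple exponential model in which each species is multiplied by (1 + efficiency) per cycle; the model, the gene names, the starting counts and the efficiency values are all hypothetical and serve only to illustrate the argument.

```python
# Illustrative sketch: uniform amplification efficiency preserves relative
# abundance, whereas species-dependent efficiency distorts it.
import math


def amplify(counts, efficiency, cycles):
    """Multiply each species by (1 + its per-cycle efficiency) for each cycle."""
    return {name: n * (1 + efficiency[name]) ** cycles for name, n in counts.items()}


def relative_abundance(counts):
    """Express each species as a fraction of the total population."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical starting copy numbers for three transcripts.
start = {"gene_a": 100, "gene_b": 10, "gene_c": 1}

# Equal-sized cDNAs are assumed to amplify with the same efficiency.
uniform = {"gene_a": 0.9, "gene_b": 0.9, "gene_c": 0.9}
# Unequal efficiencies stand in for molecules of differing size.
skewed = {"gene_a": 0.9, "gene_b": 0.8, "gene_c": 0.7}

before = relative_abundance(start)
after_uniform = relative_abundance(amplify(start, uniform, cycles=25))
after_skewed = relative_abundance(amplify(start, skewed, cycles=25))

for gene in start:
    print(gene, before[gene], after_uniform[gene], after_skewed[gene])
```

Under this model the fractions are unchanged when all species share one efficiency, and drift towards the most efficiently amplified species otherwise.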



The use of globally amplified cDNA in the production of collections of labelled target 
DNA according to the invention provides a number of advantages. A first advantage
arises from the fact that globally amplified cDNA can be produced from samples as small
as a single cell, which may typically contain in the region of 20 pg total RNA. Since
conventional techniques for the production of collections of target DNA typically require
starting quantities of RNA in the region of 20 µg, the ability to work with single cells
represents a million-fold increase in template sensitivity. A second advantage of the use of
globally amplified cDNA is that large amounts of DNA can be made, which can be
readily and simply checked by methods such as gel electrophoresis and/or real-time
quantitative PCR prior to and/or following incorporation of label. This provides
advantages not only in terms of ease of production, but also in that it avoids the costs 
associated with inefficient labelling of target DNA molecules and ineffective use or 
wastage of arrays. 
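
The million-fold figure quoted above follows directly from the two quantities given in the text, as the short Python check below shows. The variable names are chosen for the example only.

```python
# Arithmetic check of the stated sensitivity gain, using integer picograms
# to avoid floating-point rounding.
PG_PER_UG = 1_000_000  # 1 microgram = 10^6 picograms

single_cell_rna_pg = 20               # about 20 pg total RNA per cell (from the text)
conventional_rna_pg = 20 * PG_PER_UG  # about 20 micrograms (from the text)

fold_difference = conventional_rna_pg // single_cell_rna_pg
print(fold_difference)  # 1000000
```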

Exonuclease digestion to produce collections of target DNA according to the invention 
may be performed using a suitable 3' or 5' exonuclease. 

Treatment of double-stranded DNA molecules (from which the collection of target 
molecules is derived) with exonuclease results in degradation from either the 3' or 5' end
of each strand of DNA (depending on the specificity of the enzyme selected). Thus
regions of each strand that are complementary to one another are removed by digestion.
Double-stranded DNA (dsDNA) exonucleases will initiate digestion at
each end of a double-stranded DNA molecule. Since the dsDNA exonuclease
preferentially removes one strand of the double-stranded molecule, digestion tends to be
"self-limiting", and will decrease when there are no remaining regions of double-stranded
DNA. Thus the exonuclease treatment can effectively convert each starting double-
stranded DNA molecule into two non-complementary single-stranded DNA molecules
corresponding to the 3' or 5' "halves" of the original molecule.
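
The self-limiting digestion described above can be modelled with a small Python sketch. It assumes an idealised 5'-3' double-strand-specific exonuclease that removes one base at a time from the 5' end of each strand at the same rate and stops when no paired region remains; the example sequence and function names are hypothetical.

```python
# Illustrative model of self-limiting digestion of a blunt duplex by a
# 5'->3' exonuclease that acts only on double-stranded DNA.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def digest_5_to_3(top: str) -> tuple[str, str]:
    """Digest a duplex whose top strand is `top`; return the surviving strands.

    In top-strand coordinates the top strand covers [a, len) and the bottom
    strand pairs with [0, b). The duplex region is therefore [a, b).
    """
    a, b = 0, len(top)
    while a < b:        # a double-stranded region still exists
        a += 1          # one base removed from the 5' end of the top strand
        if a < b:
            b -= 1      # one base removed from the 5' end of the bottom strand
    top_remaining = top[a:]                          # 3' half of the top strand
    bottom_remaining = reverse_complement(top[:b])   # 3' half of the bottom strand
    return top_remaining, bottom_remaining


top_left, bottom_left = digest_5_to_3("ATGCCGTAAGCT")
print(top_left, bottom_left)  # TAAGCT CGGCAT
```

The two surviving strands derive from opposite halves of the original duplex, so neither contains sequence complementary to the other.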

Alternatively, with knowledge of the average size of molecules within the double-stranded
DNA population (determined, for example, by gel electrophoresis) and the rate of
digestion by the chosen exonuclease, it is possible to choose an incubation period such that
the digestion removes a chosen length of the DNA molecules. This chosen length may,
for example, be approximately half the average DNA molecule length present in the
starting double-stranded DNA population. Such a digest will, as with the technique
described above, produce two single-stranded DNA molecules corresponding to the (3' or
5') "halves" of the lengths of the two starting strands of the original double-stranded 
DNA. 
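
The choice of incubation period described above amounts to dividing the length to be removed by the digestion rate. A minimal Python sketch follows; the average length and the rate used in the example are illustrative values only and do not describe any particular enzyme.

```python
# Illustrative calculation of an incubation period for a timed digest.


def incubation_minutes(average_length_nt: float,
                       rate_nt_per_min: float,
                       fraction: float = 0.5) -> float:
    """Time needed to remove `fraction` of the average molecule length.

    average_length_nt: average duplex length, e.g. estimated from a gel.
    rate_nt_per_min:   digestion rate of the chosen exonuclease (assumed known).
    fraction:          proportion of each strand to remove (half by default).
    """
    if rate_nt_per_min <= 0:
        raise ValueError("digestion rate must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    return average_length_nt * fraction / rate_nt_per_min


# Hypothetical figures: 600 nt average length, 50 nt removed per minute.
print(incubation_minutes(600, 50))  # 6.0
```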

It will be appreciated that as a result of the digestion the two remaining molecules are not 
complementary to one another. This therefore prevents the strands of target DNA
re-hybridising to their complementary sequences found within the original double-stranded
DNA population. Thus the collection of target DNA molecules is maintained in single-
stranded form and is therefore free to hybridise to complementary single-stranded DNA
sequences in a reference sample to which they are exposed (should such sequences be
present). This improves the sensitivity of the collection of target DNA molecules when
used in hybridisation-based assays. 

When a collection of labelled target DNA molecules according to the invention is to be
used in a hybridisation-based assay (e.g. to "probe" DNA molecules of reference samples) 
it may be preferred that the DNA molecules of the reference samples are also treated with 
exonuclease in order that they too may remain single-stranded. In this case it is important 
to note that the target DNA and reference DNA should be treated with exonucleases 
having complementary specificities. For example, in one embodiment the collection of 
labelled target DNA molecules may be produced using a 3'-5' exonuclease, and the 
reference samples treated with a 5'-3' exonuclease. In an alternative embodiment the 
collection of labelled target DNA molecules may be produced using a 5'-3' exonuclease,
and the reference samples treated with a 3'-5' exonuclease. 
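
The need for complementary specificities can be made concrete with a short Python sketch. It assumes that target and reference derive from the same duplex and that each digest runs to its self-limiting end point, leaving one half of each strand; regions are tracked as intervals in top-strand coordinates. The function names and the 100-base example length are hypothetical.

```python
# Illustrative model: which halves survive each type of digest, and how many
# bases of target can pair with reference afterwards.


def surviving_regions(length: int, direction: str):
    """Return (top_region, bottom_region) as half-open intervals.

    Both intervals are given in top-strand coordinates; the bottom-strand
    interval is the stretch of top-strand positions it would pair with.
    """
    half = length // 2
    if direction == "5'-3'":
        # 5' ends are removed: each strand keeps its 3' half.
        return (half, length), (0, half)
    if direction == "3'-5'":
        # 3' ends are removed: each strand keeps its 5' half.
        return (0, half), (half, length)
    raise ValueError("direction must be \"5'-3'\" or \"3'-5'\"")


def overlap(a, b) -> int:
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def hybridisable_bases(length: int, target_dir: str, reference_dir: str) -> int:
    """Bases over which target strands can pair with reference strands."""
    t_top, t_bottom = surviving_regions(length, target_dir)
    r_top, r_bottom = surviving_regions(length, reference_dir)
    # A target top strand pairs with a reference bottom strand, and vice versa.
    return overlap(t_top, r_bottom) + overlap(t_bottom, r_top)


print(hybridisable_bases(100, "3'-5'", "5'-3'"))  # 100
print(hybridisable_bases(100, "5'-3'", "5'-3'"))  # 0
```

In this idealised model, opposite specificities leave target and reference halves that pair along their full length, whereas using the same specificity for both leaves nothing for the target to hybridise to.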

Collections of labelled target DNA molecules according to the invention may be used in a
range of detection techniques that are based upon the hybridisation of complementary
nucleic acids. An example of a technique in which labelled target DNA molecules of the
invention may be used is provided by our co-pending U.K. patent application (entitled
"Analysis of Biological Samples") filed concurrently herewith, which is described below.

A collection of labelled target DNA molecules according to the invention may be used in
a method of analysing a biological sample of interest, comprising:

(i) providing a collection of labelled target DNA molecules (acting as a "probe 
library") representative of a pattern of multiple gene expression in the biological sample
of interest; 

(ii) providing a plurality of individual reference samples each being a library 
comprised of cDNA or a derivative thereof representative of a pattern of gene expression
in reference biological samples from which the reference samples have been derived; 

(iii) treating individual reference samples with the collection of labelled target DNA 
molecules ("probe library") under hybridising conditions; and 

(iv) determining the relative degree of hybridisation of the collection of labelled target 
DNA molecules ("probe library") to the reference samples, thereby providing an
indication of the degree of similarity between gene expression in the biological sample of 
interest and gene expression in the individual reference biological samples. 
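
Step (iv) above can be sketched as a simple ranking of reference samples by relative hybridisation signal. The Python example below is a minimal illustration: the signal values and sample names are hypothetical, and normalising to the strongest signal is one possible choice made for the example rather than a step prescribed by the method.

```python
# Illustrative ranking of reference samples by relative hybridisation signal.


def relative_hybridisation(signals):
    """Scale each reference signal to the strongest signal observed."""
    strongest = max(signals.values())
    if strongest <= 0:
        raise ValueError("at least one signal must be positive")
    return {name: value / strongest for name, value in signals.items()}


def rank_references(signals):
    """Return reference names ordered from most to least similar."""
    relative = relative_hybridisation(signals)
    return sorted(relative, key=relative.get, reverse=True)


# Hypothetical label intensities measured for three reference samples.
signals = {"reference_1": 120.0, "reference_2": 480.0, "reference_3": 60.0}

relative = relative_hybridisation(signals)
ranking = rank_references(signals)
print(ranking)  # ['reference_2', 'reference_1', 'reference_3']
```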

The method of analysis described above makes it possible to obtain an indication of the 
degree of similarity of the pattern of gene expression in a sample of interest with that in a
number of reference samples. The greater the degree of similarity of the pattern of gene
expression in the sample of interest with a particular reference sample then the greater the 
similarity between these two samples. The ability to compare gene expression in a 
sample of interest with that in a large number of reference samples is, of course, 
particularly advantageous where the reference samples are of well characterised 
biological status since conclusions may then be drawn as to the biological status of the
sample of interest. 

A library for use according to the method of the invention is a collection of individual 
sequences representative of gene expression within the biological sample from which the
library is derived. The number of sequences in the collection is sufficient to provide
significant information about the biological activity or status of the biological sample
from which the library is derived. Thus, although a biological sample may express many
thousands of genes, a library may, for instance, represent ten or more genes the expression
of which is characteristic of the activity or status of the biological sample. Preferably
the probe library may represent twenty or more genes the expression of which is
characteristic of the biological sample. Most preferably the probe library may represent
fifty or more genes the expression of which is characteristic of the activity or status of
the biological sample.

More particularly the method of the invention utilises a probe library comprising cDNA,
or a derivative thereof, representative of the mRNA content of, and hence a pattern of 
gene expression in, the sample of interest. The individual reference samples are each 
libraries comprised of cDNA molecules representative of gene expression in the 
biological samples from which the reference samples are derived. The libraries may, for 
example, comprise total cDNA. Examples of cDNA derivatives that may be employed in 
the method of the invention include sub-populations of total cDNA (e.g. obtained by 
complexity reduction techniques), and derivatives obtained by partial exonuclease 
digestion of the cDNA. The term cDNA derivative also includes RNA obtained from any
form of DNA that may be employed in the invention. 

The comparison of the patterns of gene expression in the sample of interest and the
reference samples may be effected by probing the reference samples with the probe
library under conditions allowing hybridisation of molecules in the two samples that are 
complementary to one another. It is preferred that the probe library is labelled for 
purposes of detecting hybridisation. If a particular gene is being expressed in both a 
sample of interest and a reference sample then the probe representing that gene will 
hybridise to the corresponding cDNA in the reference sample. This hybridisation may 
then be detected. The greater the level of hybridisation between the probe library and 
reference samples the greater the degree of similarity in the patterns of gene expression in
the samples from which they are derived. 

The ability to compare overall gene expression in a sample of interest with that in a 
number of reference samples is particularly advantageous when the reference samples are
of defined and well-characterised biological status since conclusions may then be drawn
as to the biological status of the sample of interest.

The method of the invention need only require a single round of hybridisation to allow
comparison between the pattern of expression of a plurality of genes in a sample of
interest with the pattern of expression of the same genes in a number of reference 
samples. The pattern of expression may potentially extend to the expression of thousands 
of different genes. Since known techniques only analyse the expression of either small 
numbers of genes or small numbers of samples such information could only be provided
by the prior art on completing multiple rounds of hybridisation. Thus the method of the 
invention provides advantages both in terms of a reduction in the time necessary to 
perform such a comparison, and also in the reduced amount of reagents required. 

In contrast to existing methodologies (in which specific probes are used to investigate the 
expression of specific genes) the method of the invention is able to compare patterns of
gene expression without requiring any specific information as to the genes involved. Thus
it is not necessary to identify those genes that may be of interest before comparing
patterns of gene expression between unknown and reference samples. This provides a
considerable advantage over the prior art in that an investigator does not need to know
what gene (or genes) are involved in, for example, a particular response to a therapeutic 
agent before he can establish whether a test subject is likely to respond in a similar way to 
a previously characterised subject with a known response. 

The reference samples may be derived from biological reference samples representing a 
number of different biological conditions or states. Alternatively the reference samples 
may be derived from biological reference samples representing a number of different
examples of the same biological condition or state. Individual reference samples may be
derived from biological samples taken from one or more individuals. In the instance that a
reference sample is derived from a single individual the reference sample may be derived
from a biological sample representing a single tissue, or from biological samples
representing a number of different tissues. In the case of reference samples derived from
biological samples taken from more than one individual the biological samples may all
represent one type of tissue, or may represent a number of different tissue types. By
including reference samples which share a common biological phenotype yet have arrived
at that state via different routes the method of the invention is able to discriminate
between treatment and biological status.

Examples of materials that may be used as reference samples include samples which are
derived from patients with known clinical conditions and/or with known clinical
outcomes.

In one such example reference samples may be taken from a number of patients with
different forms of a particular disease. In this instance the sample of interest may be
taken from an individual suspected of having, or being predisposed to, the disease in
question. By comparing the pattern of gene expression in the sample of interest with the
patterns of gene expression in the reference samples it is possible to establish which of the
reference samples the sample of interest most closely resembles. This may then in turn 
provide an indication as to the particular form of the disease in question that the 
individual tested has or is predisposed to. 

Alternatively the reference samples may be derived from patients with the same disease, 
but having different (known) reactions to different therapeutic agents. In this case
comparison of the sample of interest and the reference samples will establish which of the
patients with known treatment history the individual providing the sample of interest most
resembles. This knowledge can then be used in order to select the treatment regime
believed most likely to produce a beneficial outcome for the individual in question. 

In another alternative reference samples may be derived from the same patient at different
times, for instance before, during and after therapy. Comparison of such samples with a
sample of interest taken from a patient with the same disease may be useful in assessing
the progress of the patient of interest during treatment.

In a further alternative suitable reference samples may be collected from experimental
subjects, such as animals or cultured cells, that have undergone procedures the effects of
which are well studied. 

For example reference samples may be collected from cells of a cell line that have been
exposed to different drugs that have known effects (either on the cell line or on organisms
of interest). These samples may then be probed using a probe library derived from cells
of the same cell line that have been exposed to a putative drug which has an unknown
effect. By comparing gene expression patterns established in response to the known and
unknown drugs it is then possible to establish which of the known drugs the unknown
drug most resembles. This will provide an indication that the effects of the unknown drug
are likely to be similar to those of the known drug that it most closely resembles.

In a further example the sample of interest may be taken from tissues of experimental
animals that have undergone treatments bringing about conditions that resemble those of a
disease of interest. The pattern of gene expression in these samples may then be
compared with the pattern of gene expression in reference samples taken from normal
biological samples or biological samples from individuals suffering from the disease in
question in order to investigate how changed gene expression influences the particular
disease. Suitable experimental animals may, for instance, include transgenic animals,
such as animals in which certain genes have been up-regulated, down-regulated or
deleted. 

In another application of the method of the invention the sample of interest may be taken
from a tissue that includes, or may be thought to include, a cell type of particular interest.
Such cells may, for example, be stem or progenitor cells. In this case tissues representing
suitable biological samples from which reference samples may be derived will include
tissues known to contain the cell type of interest, or tissues known to contain specific
forms of the cell type of interest. Comparison of the pattern of gene expression in the
sample of interest with the pattern of gene expression in the reference samples may
indicate that the sample of interest either does or does not contain the cell type of interest.
If the sample of interest contains cells of the cell type of interest then the method of the
invention may also provide information as to the number, form or status of these cells
present in the sample.

In the field of stem cell biology defining the specific gene expression changes in stem 
cells, their immediate daughter cells, cells committed to differentiation and fully
differentiated cells under conditions that alter self-renewal and differentiation represents a 
powerful means of identifying potential drug targets. For example, the discovery of a 
growth factor or growth factor receptor specifically expressed in stem cells undergoing
increased self-renewal would lead to the development of pharmacological approaches 
designed to inhibit stem cell expansion during cancer development or increase stem cell 
expansion following injury. Furthermore, identification of genes whose expression is 
specifically linked to eventual stem cell self-renewal and differentiation will greatly 
facilitate the monitoring of stem cell behaviour that is an essential component of pre-
clinical drug evaluation.

Since the method of the invention compares the patterns of expression of a number of
genes within test and reference samples it is particularly well suited to the study and 
comparison of biological activities associated with stem cells that involve the interplay of 
a number of different genes, for example in a biochemical pathway. Using known 
methods to investigate such interactions it would be necessary to identify each gene 
involved in a pathway and conduct separate hybridisation reactions to determine the 
expression of each gene. The method of the invention, in contrast, will in a single round
of hybridisation report on the comparative expression of all genes involved, even 
including genes that may not be known to be associated with the pathway. 

Conveniently the method of the invention may be effected using an array or microarray
on a solid substrate. Either the probe library or the reference samples may be provided as
an array or microarray on a suitable substrate. Preferably it is the reference samples that
are provided as the array on the substrate. In a particularly preferred embodiment an
array or microarray may comprise a library of reference samples containing DNA samples
derived from groups representative of different biological conditions, each group
containing samples derived from a number of different individuals sharing the same
condition, wherein the DNA samples are arranged in order on the array or microarray
such that members of the same group are located in proximity to one another. The DNA
samples may, for instance, be arranged in order in a grid pattern such that each row of the
grid represents a group of individuals sharing the same biological condition.
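
The grid arrangement described above can be sketched in a few lines of Python. This is only an illustration: the function name `layout_array`, the sample identifiers and the condition labels are hypothetical and do not come from the specification.

```python
from collections import defaultdict


def layout_array(samples):
    """Group (sample_id, condition) pairs so that each row holds one condition.

    Returns a list of (condition, [sample_ids]) rows. Conditions appear in
    first-seen order, so members of the same group sit next to one another.
    """
    rows = defaultdict(list)
    for sample_id, condition in samples:
        rows[condition].append(sample_id)
    return [(condition, ids) for condition, ids in rows.items()]


# Hypothetical reference samples from individuals sharing a condition.
samples = [("H1", "heart"), ("L1", "lung"), ("H2", "heart"), ("L2", "lung")]
grid = layout_array(samples)
for condition, ids in grid:
    print(condition, ids)
```

Each printed line corresponds to one row of the array, that is, one group of individuals sharing the same biological condition.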

A suitable microarray may, for instance, be produced on a substrate such as glass, a silica- 
based chip, a nylon membrane or a microtiter plate. Many examples of techniques 
suitable for the manufacture of arrays or microarrays will be readily apparent to those 
skilled in the art. These include the techniques disclosed in Maniatis et al. 1982, Chee et
al. 1996, Iyer et al. 1999, Lipshutz et al. 1995, Lockhart et al. 1995, Schena 1996, Schena
et al. 1995, Soares et al. 1997 and Southern 1996.

DNA may be coupled to the support material forming the array by means of electrostatic
interaction with a coating film of a polycationic polymer such as poly-L-lysine (as 
described in WO 95/35505) or may be covalently bound to the support by well 
established techniques. 

In an alternative embodiment the method of the invention may, if preferred, be performed 
with both the probe library and reference samples in free solution.

Biological samples suitable for use according to the invention include any sample
containing material representative of gene expression in the sample, such as mRNA.
Biological samples preferably comprise biological cells; indeed a suitable biological
sample may even comprise a single cell. Suitable samples may be taken by means of
biopsies, swabs, hair or skin samples, or as samples of bodily fluids such as blood,
cerebrospinal fluid (CSF), saliva, milk, faeces and urine. In particular, samples for use in
analysis of stem cells may suitably be taken from foetal or embryonic tissue, from bone
marrow or from germ cells or from other tissues in the adult or developing organism.

The probe library and the reference samples may preferably comprise, or be derived from,
global amplified cDNA.

Materials that may be derived from total cDNA include sub-populations of the total
cDNA, truncated or otherwise manipulated versions of the cDNA and other materials
representative of patterns of gene expression produced using the total cDNA as a
template.

The invention may be effected using probe libraries and/or reference samples comprising 
cDNA produced as described above without further modification. However, various 
modifications may be made which will improve the sensitivity of the method.

Complexity reduction techniques may also be used in preparation of the probe library 
and/or reference samples to improve the sensitivity of the method of the invention. The 
rationale behind such techniques is that many of the mRNAs present in a biological 
sample, such as the sample of interest or the reference samples, represent transcription of 
so-called "house keeping" genes. These genes encode products associated with the
upkeep of the cell and are generally likely to be common to both samples of interest and
reference samples. As such they represent components of gene expression patterns that
may be found in both test and reference samples, but which are unlikely to be important
in the development or maintenance of a biological condition or state of interest. It has
been estimated that up to 65% of mRNA mass within cells may be composed of
transcripts representing "house-keeping" genes.

Complexity reduction techniques improve sensitivity either by simply reducing the 
number of individual genes represented in the probe library and/or reference samples, or 
by specifically removing irrelevant or "house keeping" genes from the probe library 
and/or reference samples. Thus the relative abundance of those molecules representative 
of gene expression that remain after application of a complexity reduction technique is 
increased, thereby increasing the "signal to noise" ratio. 
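
The gain in relative abundance can be illustrated with a small calculation. The sketch below assumes an idealised case in which a fraction of the mRNA mass is removed completely and the rest is untouched; the function name `fold_enrichment` is hypothetical.

```python
def fold_enrichment(removed_fraction):
    """Fold increase in relative abundance of the sequences that remain.

    Assumes the removed fraction of the population is eliminated completely
    and the remaining sequences are left unchanged (an idealisation).
    """
    if not 0 <= removed_fraction < 1:
        raise ValueError("removed_fraction must be in [0, 1)")
    return 1.0 / (1.0 - removed_fraction)


# If house-keeping transcripts making up 65% of mRNA mass were removed,
# each remaining transcript would be roughly 2.9-fold more abundant.
print(round(fold_enrichment(0.65), 2))
```

In practice removal is never complete, so the figure is best read as an upper bound for that removed fraction.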

A number of complexity reduction techniques may be used in effecting the method of the
invention. These techniques may be used in isolation or in combination. Preferably the
same complexity reduction technique, or combination of complexity reduction
techniques, is used to treat the cDNA, or its derivatives, to produce both the probe
library and the reference samples, although it is possible to apply complexity reduction
techniques to only one of the DNA populations.

Suitable examples of complexity reduction techniques include: 

Restriction enzyme based 

In this complexity reduction technique site specific endonucleases are used to digest the
cDNA or its derivatives. Since the frequency of cleavage sites for any specific
endonuclease will depend on the size and base composition of the cleavage site,
endonucleases can be chosen that will cut a sub-set of all DNA molecules present. For
example, a restriction endonuclease recognising a six base site will, on average, cleave
every 4,096 base pairs. Thus in a DNA population in which the average polynucleotide
size is 2,000 bases such an endonuclease will cleave approximately half of all
polynucleotides present. Following restriction digestion either the cleaved products or the
uncleaved products can be selectively enriched. By choosing the appropriate restriction
enzymes distinct subsets of cDNA sequences can be either eliminated or enriched. By
applying this type of strategy the initial total cDNA sample can be divided into subsets of
genes whereby each sequence is effectively enriched, making it more likely that changes
in each individual gene will be detected during array hybridisation.

Thus for each individual gene sequence present after applying complexity reduction there
will be an increase in specific activity and an increase in the "signal to noise" ratio.
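
The figures quoted for a six base recognition site can be checked with a short calculation. The sketch below assumes a random sequence with equal base frequencies and treats positions as independent, which real cDNA only approximates; the function names are hypothetical.

```python
def expected_site_spacing(site_length):
    """Average distance between sites in random sequence with equal base frequencies."""
    return 4 ** site_length


def expected_sites(molecule_length, site_length):
    """Expected number of recognition sites in one molecule."""
    positions = max(molecule_length - site_length + 1, 0)
    return positions * 0.25 ** site_length


def fraction_cleaved(molecule_length, site_length):
    """Probability that a molecule carries at least one site (independence assumed)."""
    positions = max(molecule_length - site_length + 1, 0)
    return 1.0 - (1.0 - 0.25 ** site_length) ** positions


print(expected_site_spacing(6))              # 4096
print(round(expected_sites(2000, 6), 2))     # about 0.49 sites per molecule
print(round(fraction_cleaved(2000, 6), 2))   # about 0.39 of molecules cut
```

Under these assumptions a 2,000 base molecule carries on average about half a site, which matches the "approximately half" estimate; the fraction of molecules containing at least one site is somewhat lower because some molecules carry more than one.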

Display products. 

Another means of selecting a subset of sequences present in the starting cDNA/mRNA
population, and thereby increasing the relative abundance of each selected sequence after
complexity reduction, is the use of approaches for differential cDNA display (Liang and
Pardee, 1992). cDNA display selectively amplifies only those cDNA populations which
contain effective priming sites for the display primer(s) used. Display primers can be used to
prepare distinct subsets of cDNAs directly from starting RNA (Liang and Pardee, 1992)
or alternatively display amplification may be applied to amplified total cDNA
populations (Candeliere et al., 1999). In essence display techniques reduce complexity by
selectively enriching a subset of the sequences present in the original mRNA or cDNA
population, thereby increasing the relative abundance of the selected sequences within the
resultant population.

Hybridisation depletion and enrichment.

A variety of DNA/RNA subtraction techniques have been developed to deplete 
DNA/RNA sequences common to two or more pools of DNA/RNA molecules. 
DNA/RNA subtraction applied to DNA or RNA copies (either direct copies or amplified 
products) of the original extracted RNA can be used to reduce complexity by removing 
sequences. 

Suitable DNA/RNA subtraction techniques for use according to the invention are well
known. One such method involves the production of a single-stranded cDNA library (the
"tracer"), such as the cDNA from which the probe library or reference samples are to be
generated, from which it is desired to remove certain sequences. A collection of
amplified cDNAs representing the sequences that one wishes to eliminate (the "driver"),
such as housekeeping genes, is then allowed to hybridise with the tracer. Double stranded
DNA molecules, representing hybrids of the tracer and the driver, may then be removed
from the total population of DNA based upon their adhesion to hydroxyapatite. The
remaining DNA population comprises single stranded DNA molecules representing the
tracer population minus the driver population. This subtracted DNA population may then
be further amplified as required.
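
Conceptually, the outcome of tracer/driver subtraction is a set difference. The following sketch models only that logical outcome, not the hybridisation chemistry; the gene names are hypothetical placeholders and the function name `subtract` is an assumption.

```python
def subtract(tracer, driver):
    """Return the tracer sequences that have no counterpart in the driver.

    Models ideal subtraction: every tracer sequence that also occurs in the
    driver forms a hybrid and is removed; everything else is retained.
    """
    driver_set = set(driver)
    return [seq for seq in tracer if seq not in driver_set]


# Hypothetical gene labels standing in for cDNA sequences.
tracer = ["housekeeping_1", "tissue_specific_A", "housekeeping_2", "tissue_specific_B"]
driver = ["housekeeping_1", "housekeeping_2"]
print(subtract(tracer, driver))
```

Real subtraction is incomplete and depends on driver excess and hybridisation kinetics, so the retained population is an enrichment, not a clean difference.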

In further refinements of this method "driver" nucleic acids may be covalently linked to
compounds which facilitate the physical separation of "driver" nucleic acids (plus any
annealed "tracer") from unhybridised "tracer". The separated populations (i.e. those
sequences present only in the "tracer", or those sequences shared by both "tracer" and
"driver") may then be enriched or depleted relative to one another. For example, driver
nucleic acids may be linked to biotin, such that following hybridisation all biotinylated
hybrids can be segregated by interaction with immobilised avidin, allowing either
subtractive enrichment or positive selection. Suitable protocols are described in Welcher
et al., 1986; and Weaver et al., 1999. In alternative, but similar, approaches "driver"
nucleic acids may be bound to latex beads (as described in Kuribayashi-Ohta et al., 1993)
or magnetic particles (as described in Lopez-Fernandez and del Mazo, 1993; and Schraml
et al., 1993).

In one embodiment hybridisation depletion/enrichment protocols can be used to remove
"unwanted sequences" present in samples from which the probe library and/or reference
samples are derived. The nature of the "unwanted sequences" will depend on the 
biological samples in question. However, as a general rule, sequences which are 
expressed at similar levels in diverse samples are, by their very nature, uninformative and 
tend simply to add to the "background" produced during hybridisation. 

It is likely that genes expressed at a similar level in biologically divergent tissues will not
be characteristic of a particular tissue, and will instead represent house-keeping genes. By
way of example, it is unlikely that genes expressed at a similar level in tissues as
biologically different as heart, lung, spleen and testes will be characteristic of any one of
these tissues. Sequential hybridisation enrichment can be used to obtain a "pool" of
sequences common to different tissues. The resultant pool will represent genes that
contribute to the "background noise" associated with hybridisation. This pool can then be
expanded and used to reduce the level of background hybridisation. For example, it is
possible to subtract these common sequences from both the probe library and reference
samples, thereby reducing the level of total hybridisation. Alternatively it is possible to
use the pool of common genes to produce unlabelled competitor DNA and thereby reduce
the level of detectable hybridisation.

Using probe libraries and reference samples produced in accordance with the techniques
described above the method of the invention may be effected by incubating the reference
samples and probe library under hybridising conditions. The conditions under which
nucleic acids will hybridise to one another are well known to those skilled in the art.
Specific conditions are described in greater detail in the accompanying Example. Further
examples of conditions suitable for nucleic acid hybridisation can be found in reference
works such as "Molecular Cloning: A Laboratory Manual" edited by Maniatis et al. Other
suitable conditions are described in Chee et al. 1996, Iyer et al. 1999, Lipshutz et al.
1995, Lockhart et al. 1995, Schena 1996, Schena et al. 1995, Soares et al. 1997 and
Southern 1996.

Similarly, methods for determining the relative degree of hybridisation between
populations of nucleic acids are also well known. Methods suitable for effecting the
invention include labelling of the probe library with reporters such as fluorescent labels,
radioactive labels or chromogenic enzymes. If the reference sample libraries are
unlabelled then detection of the chosen label (after removing unbound probe) will confirm
the presence of hybridisation between the sample of interest and the reference sample.
Suitable techniques for labelling of the molecules comprising the probe library, for
detection of hybridised probe and reference DNA molecules and for interpretation of
hybridisation data are well known to those skilled in the art. These techniques include
those described in Maniatis et al. 1982, Chee et al. 1996, Iyer et al. 1999, Lipshutz et al.
1995, Lockhart et al. 1995, Schena 1996, Schena et al. 1995, Soares et al. 1997 and
Southern 1996.



Use of unlabelled competitor DNA.

When the probe library DNA is labelled and the reference sample DNA is unlabelled the
sensitivity of the method of the invention may be improved by the use of unlabelled
"competitor" DNA which can compete with the DNA of the probe library for
hybridisation with the reference samples. The competitor DNA may be DNA
representing common housekeeping genes, or it may be selected DNA representing genes
common to the biological sample of interest and/or the reference samples. Since the
competitor DNA is unlabelled, hybrids of competitor and reference DNA will not be
detected in assessing total hybridisation.

The competitor DNA may be exposed to the reference sample DNA before the addition of
the probe library DNA or at the same time as the addition of the probe library DNA.
Molecules of the competitor DNA that represent genes expressed by the reference
samples will then hybridise to the corresponding DNA of the reference samples.
Reference sample molecules that undergo hybridisation with molecules of the competitor
DNA will therefore be unable to hybridise with further molecules from the probe library.
Thus by incubating the DNA of the reference samples with, for example, unlabelled
competitor DNA representative of housekeeping genes it is possible to reduce the level of
binding by labelled probe DNA representing the same genes. This therefore improves the
sensitivity of the method of the invention since it increases the likelihood that detected
hybridisation is representative of genes of interest within the samples.

Unlabelled competitor DNA representative of genes having a high frequency of 
expression within the biological sample of interest and/or reference samples may be 
generated by reverse subtraction of the DNA populations derived from the two samples. 

The present invention will now be illustrated by way of example only with reference to 
the accompanying drawing, illustrating use of the collection of labelled target DNA 
molecules as a "probe library" in the method described above, in which: 

Figure 1a represents a schematic depiction of an array of reference samples suitable for
use in the method of the invention before effecting hybridisation;

Figure 1b represents the same array after effecting hybridisation of a probe library with
the reference samples; and

Figure 1c represents a flow chart indicating suitable methods for producing a probe
library and reference samples according to the invention.

Figure 1a shows an array (1) provided with individual reference samples (2) derived from
cDNA generated from biological reference samples. Each individual reference sample is 
a library representative of a pattern of gene expression in the biological reference sample. 
The rows of reference samples (2) on the array (1) each represent a distinct biological 
condition or state. Each reference sample (2) within a row is derived from a different 
individual sharing the same biological condition or state. 

Figure 1b shows the results of probing the reference samples (2) on the array (1) with a
labelled probe library according to the method of the invention. Sequences present within
both the probe library and the reference samples (2) hybridise to one another.
Hybridisation is measured by colour development, hence the greater the degree of
hybridisation between the probe library and a reference sample the more intense the
colour. Thus in Figure 1b it can be seen that the probe library exhibits the greatest degree
of similarity (and so hybridisation) with the reference samples of row 10, a lesser degree
of similarity (and hybridisation) with reference samples of rows 3 and 6 and a still lesser
degree of similarity with the reference samples of rows 1 and 8. The probe library does
not share any sequences in common with the other rows of reference samples (2) and thus
does not hybridise with these reference samples, so producing no colour development.
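
The reading of such an array, in which rows are ranked by colour intensity, can be sketched as follows. The intensity values are invented for illustration, and `rank_rows` and its `threshold` parameter are hypothetical names, not part of the described method.

```python
from statistics import mean


def rank_rows(intensities, threshold=0.0):
    """Rank array rows by mean spot intensity, strongest first.

    intensities maps a row number to the list of spot intensities measured
    for the reference samples in that row. Rows whose mean does not exceed
    the threshold are treated as showing no hybridisation and are dropped.
    """
    scored = {row: mean(spots) for row, spots in intensities.items()}
    hits = [(row, score) for row, score in scored.items() if score > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Invented intensities on an arbitrary 0-1 scale.
measured = {1: [0.2, 0.2], 2: [0.0, 0.0], 3: [0.5, 0.5], 10: [0.9, 0.8]}
ranking = rank_rows(measured)
print([row for row, _ in ranking])
```

Averaging over the individuals in a row is one simple choice; a median or a per-sample comparison would be equally reasonable.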

The probe library and reference samples may be prepared by the procedures illustrated in
Figure 1c in which RNA (3) from a biological sample of interest is amplified according
to known protocols to generate global cDNA (4). This global cDNA (4) may then be
used directly to produce a probe library (as indicated by arrow 5) or, more preferably, is
subjected to complexity reduction techniques (6) prior to probe library production.

Complexity reduction (6) may, for instance, take the form of processing to display
products (7), subtraction of unwanted sequences from the global cDNA generated from
the sample of interest (8) or restriction digest of sequences in the cDNA generated from
the sample of interest (9). The cDNA generated from the sample of interest may be
subject to a combination of complexity reduction techniques (e.g. subtraction (8) and
restriction digest (9)) or may be used to produce a probe library after a single complexity
reduction technique.

Optionally, the cDNA, or derivative, of the probe library may be subject to exonuclease
digestion in order to improve the sensitivity of the invention. This digestion may be 
effected either before or after complexity reduction. 

Production of the probe library is completed by using known techniques to label (10) the
cDNA generated from the sample of interest.

Although the generation and processing of cDNA has been described above with
reference to production of the probe library, the techniques described (with the exception
of labelling the probe library) are all equally suitable for production of reference samples
from biological reference samples in order to produce a suitable array (11). Preferably
both the probe library and reference samples to be used according to the method of the
invention are produced using the same complexity reduction techniques. In the situation
that both the probe library and reference samples are to be subject to exonuclease
digestion the two different cDNA populations should be treated with exonucleases having
different specificities, i.e. one treated with a 5' to 3' exonuclease, and the other treated
with a 3' to 5' exonuclease.

Protocols.

The following Protocols are suitable for effecting the method described above using a
collection of labelled target DNA molecules according to the invention as a "probe
library".

(a) Preparation of global amplified cDNA 

(i) Preparation of cDNA 

(ii) Terminal transferase - "Tailing" 

(iii) Global cDNA amplification 

(b) Preparation of array of reference samples 

(c) Labelling of probe library 

(i) Terminal Transferase labelling 

(ii) PCR labelling

(d) Exonuclease treatment of double-stranded DNA 

(e) Hybridisation of probe library and reference samples 

(f) Detection of hybridisation

(g) Complexity reduction

(i) Display Based 

(ii) Hybridisation depletion and enrichment 



(a) Preparation of global amplified cDNA.

The protocol described below is based on protocols described in Brady et al. (1990) and 
Brady, G., and Iscove, N. N. (1993). 

Suitable starting materials include total RNAs, which may be prepared from biological 
tissues of interest (using commercially available kits such as those manufactured by 
Clontech), or mRNA present in biological cells ("direct amplification"). 

(i) Preparation of cDNA.

cDNA may be prepared from the mRNA from the biological tissues according to the 
following protocol: 

1. RNAs are adjusted to 100 microgram/ml in 10 mM Tris pH 7.5, 1 mM EDTA.

2. 3 µl of each RNA is added to 3 µl of the following buffer:

100 mM Tris pH 8.3
150 mM KCl
6 mM MgCl2
0.2 mg/ml Glycogen (Roche)
2 % NP-40 (Roche)
2.5 nM dNTPs (Sigma)
0.75 µM dT24 (Sigma/Genosys)
0.37 u/ml RNAse inhibitors (Ambion)



3. Samples are heated to 65°C for 1 minute, allowed to cool at RT for 3 minutes, then
placed on wet ice.

4. After 1 to 10 minutes on ice 3 µl of the following buffer containing 85 u MMLV
RTase and 1 u AMV RTase is added to each sample:

50 mM Tris pH 8.3
75 mM KCl
3 mM MgCl2
0.1 mg/ml Glycogen (Roche)
1 % NP-40 (Roche)

5. Samples are incubated for 15 minutes at 37°C, heat inactivated at 65°C for 10 minutes
then cooled to 4°C.
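
The buffer concentrations in steps 2 and 4 can be cross-checked with a simple mixing calculation. The sketch below follows KCl only and assumes that the RNA solution (Tris/EDTA) contributes no KCl; the helper name `mix` is hypothetical.

```python
def mix(components):
    """Final concentration after combining (volume_ul, concentration) pairs.

    Concentrations may be in any unit as long as it is the same for all
    components; the result is in that unit.
    """
    total_volume = sum(volume for volume, _ in components)
    return sum(volume * conc for volume, conc in components) / total_volume


# Step 2: 3 ul RNA (assumed 0 mM KCl) plus 3 ul buffer at 150 mM KCl.
after_step_2 = mix([(3, 0.0), (3, 150.0)])
# Step 4: the 6 ul mixture plus 3 ul enzyme buffer at 75 mM KCl.
after_step_4 = mix([(6, after_step_2), (3, 75.0)])
print(after_step_2, after_step_4)  # 75.0 75.0
```

On this reading the first buffer is a two-fold concentrate and the enzyme buffer is at working strength, so adding the enzymes leaves the salt concentration unchanged.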

(ii) Terminal transferase - "Tailing"

1. 5 µl of each sample is mixed with 5 µl of the following buffer containing 2.3 units
terminal transferase:

200 mM potassium cacodylate pH 7.2
4 mM CoCl2
0.4 mM DTT
1 mM dATP

2. Samples are then incubated for 15 minutes at 37°C, heated at 65°C for 10 minutes and
cooled to 4°C.

(iii) Global cDNA amplification. 

Global cDNA prepared from biological tissues according to the preceding protocols may
be amplified according to the following protocol: 

1. 8 µl of the tailed cDNA prepared as described above may be combined with 8 µl
of:

121.4 mM KCl
8.5 mM MgCl2
24.25 mM Tris-HCl pH 8.3
48 µg/ml Glycogen (Roche)
2.4 % Triton X-100
23 mM dNTPs
9-6 Oligo NotIdT (sequence
CATCTCGAGCGGCCGCTTTTTTTTTTTTTTTTTTTTTTTT)
0.16 u/µl Taq Polymerase



2. Samples are then placed into a PCR machine and subjected to:

25 cycles:
1 minute 94°C
2 minutes 42°C
6 minutes 72°C

followed by an additional 25 cycles:
1 minute 94°C
1 minute 42°C
2 minutes 72°C

3. Following completion of PCR samples are purified using the Millipore 96 well
purification system (Millipore MANU 03050) following instructions provided by the
manufacturer.
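
The two-stage cycling programme of step 2 can be written down as data, which makes the total hold time easy to compute. In this sketch the extension temperature of the first stage is taken to be 72°C, which is an assumption, and ramp times between temperatures are ignored; the names `PROGRAM` and `total_minutes` are hypothetical.

```python
# Each stage: (number of cycles, [(temperature in deg C, minutes), ...]).
PROGRAM = [
    (25, [(94, 1), (42, 2), (72, 6)]),  # first 25 cycles (72 deg C assumed)
    (25, [(94, 1), (42, 1), (72, 2)]),  # additional 25 cycles
]


def total_minutes(program):
    """Sum of all hold times in the programme, ignoring ramping."""
    return sum(
        cycles * sum(minutes for _, minutes in steps)
        for cycles, steps in program
    )


print(total_minutes(PROGRAM))  # 325 minutes, i.e. a little under five and a half hours
```

The long extension steps of the first stage account for most of the run time.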

(b) Preparation of array of reference samples. 

An array comprising global purified cDNA (prepared as described above) may be 
produced using the following protocol: 

Purified global cDNAs from heart, lung, spleen and testes may separately be adjusted to
around 50 ng/µl in 50% DMSO, boiled and spotted in groups of 12 onto CMT-GAPS glass
slides (Corning) using a Gene Machines OmniGrid as recommended by the manufacturer.

(c) Labelling of probe library.

The following provides suitable protocols for labelling of probe library cDNA for use 
according to the method of the invention. The following protocols describe the labelling
of two different cDNA populations (which may be prepared using the protocols described 
above) with two different fluorescent markers (Cy3 and Cy5). 

(i) Terminal Transferase labelling.

1. Approximately 50 ng of globally amplified cDNA of a first probe library may be
added to a 20 µl reaction containing:

100 nM FluoroLink™ Cy3-dUTP (Amersham Pharmacia Biotech)
100 mM potassium cacodylate pH 7.2
2 mM CoCl2
0.2 mM DTT
total 5 units Terminal Transferase

2. Approximately 50 ng of globally amplified cDNA of a second probe library may
be added to a 20 µl reaction containing:

100 nM FluoroLink™ Cy5-dUTP (Amersham Pharmacia Biotech)
100 mM potassium cacodylate pH 7.2
2 mM CoCl2
0.2 mM DTT
total 5 units Terminal Transferase

3. Following incubation for 1 hour at 37°C both samples may be ethanol precipitated
by the addition of:

10 µl 7.5 M Ammonium Acetate
0.5 µl 15 mg/ml Glyco Blue (Ambion)
75 µl ethanol

Samples may then be held on wet ice for 15 minutes, centrifuged at 4°C at 14,000 rpm for
20 minutes and the pellets washed twice with 70% ethanol, allowed to dry 10 minutes at
room temperature then resuspended in 5 µl 10 mM Hepes pH 7.5, 1 mM EDTA.

(ii) PCR labelling.

Further rounds of PCR amplification can be used to incorporate fluorescent markers
directly or indirectly coupled to nucleotides present in the PCR reaction. An example of
such an approach is given below.

1. Approximately 0.5 ng of globally amplified cDNA of a first probe library may be
added to a 20-100 µl reaction containing:

100 nM FluoroLink™ Cy3-dUTP (Amersham Pharmacia Biotech)
100 nM dNTPs
1 µM Oligo NotIdT (sequence
CATCTCGAGCGGCCGCTTTTTTTTTTTTTTTTTTTTTTTT)
16 mM (NH4)2SO4
67 mM Tris-HCl (pH 8.8 at 25°C)
0.01% Tween-20
0.16 u/µl Taq Polymerase

2. Approximately 0.5 ng of globally amplified cDNA of a second probe library may
be added to a 20-100 µl reaction containing:

100 nM FluoroLink™ Cy5-dUTP (Amersham Pharmacia Biotech)
100 nM dNTPs
1 µM Oligo NotIdT (sequence
CATCTCGAGCGGCCGCTTTTTTTTTTTTTTTTTTTTTTTT)
16 mM (NH4)2SO4
67 mM Tris-HCl (pH 8.8 at 25°C)
1.5 mM MgCl2
0.01% Tween-20
0.16 u/µl Taq Polymerase



3. Both samples are then placed into a PCR machine and subjected to:

25 cycles:
30 seconds 94°C
1 minute 42°C
2 minutes 72°C



4. Following PCR both samples may be ethanol precipitated by the addition of:

0.5 original sample volume 7.5 M Ammonium Acetate
0.025 original sample volume 15 mg/ml Glyco Blue (Ambion)
3.5 original sample volumes ethanol

Samples may then be held on wet ice for 15 minutes, centrifuged at 4°C at 14,000 rpm for
20 minutes and the pellets washed twice with 70% ethanol, allowed to dry 10 minutes at
room temperature then resuspended in 5 µl 10 mM Hepes pH 7.5, 1 mM EDTA.
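
Because the PCR labelling reaction volume may vary between 20 and 100 µl, the precipitation reagents are specified as multiples of the original sample volume. A minimal sketch of that scaling follows; the function name and dictionary keys are hypothetical.

```python
def precipitation_volumes(sample_ul):
    """Reagent volumes (in ul) for ethanol precipitation of a sample.

    Uses the proportions given in the protocol: 0.5 volumes of 7.5 M
    ammonium acetate, 0.025 volumes of 15 mg/ml Glyco Blue and 3.5 volumes
    of ethanol, all relative to the original sample volume.
    """
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return {
        "ammonium_acetate_ul": 0.5 * sample_ul,
        "glyco_blue_ul": 0.025 * sample_ul,
        "ethanol_ul": 3.5 * sample_ul,
    }


for name, volume in precipitation_volumes(100).items():
    print(f"{name}: {volume:.1f}")
```

For a 100 µl reaction this gives 50 µl of ammonium acetate, 2.5 µl of Glyco Blue and 350 µl of ethanol.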

(d) Exonuclease treatment of double-stranded DNA.

Note exonuclease treatment can be applied to either freshly amplified cDNA or labelled
cDNA.

(i) 3'-5' exonuclease - Exonuclease III

1. Add 0.5 - 5 µg cDNA to a 50 µl reaction consisting of:

660 mM Tris pH 8.0
6.6 mM MgCl2
5 mM DTT
50 µg/ml BSA
10 units Exonuclease III (Amersham Pharmacia)

2. Incubate 30 minutes at 37°C.

3. Following heat inactivation at 75°C for 30 minutes ethanol precipitate by the
addition of:

0.5 original sample volume 7.5 M Ammonium Acetate
0.025 original sample volume 15 mg/ml Glyco Blue (Ambion)
3.5 original sample volumes ethanol

Samples may then be held on wet ice for 15 minutes, centrifuged at 4°C at 14,000 rpm for
20 minutes and the pellets washed twice with 70% ethanol, allowed to dry 10 minutes at
room temperature then resuspended in 5 µl 10 mM Hepes pH 7.5, 1 mM EDTA.

(ii) 5'-3' exonuclease - T7 Gene 6 Exonuclease

1. Add 0.5 - 5 µg cDNA to a 50 µl reaction consisting of:

40 mM Tris pH 7.5
20 mM MgCl2
50 mM NaCl
10 units T7 Gene 6 Exonuclease (Amersham Pharmacia)

2. Incubate 30 minutes at 37°C.

3. Following heat inactivation at 75°C for 30 minutes ethanol precipitate by the
addition of:

0.5 original sample volume 7.5 M Ammonium Acetate
0.025 original sample volume 15 mg/ml Glyco Blue (Ambion)
3.5 original sample volumes ethanol

Samples may then be held on wet ice for 15 minutes, centrifuged at 4°C at 14,000 rpm for
20 minutes and the pellets washed twice with 70% ethanol, allowed to dry 10 minutes at
room temperature then resuspended in 5 µl 10 mM Hepes pH 7.5, 1 mM EDTA.



(e) Hybridisation of probe library and reference samples.

Hybridisation of probe library and reference samples according to the method of the 
invention may be effected as follows, using an array and probe libraries prepared as 
described above. 

1. An array slide may be prehybridised at 42°C for 1 hour in the following buffer:

50% Formamide
5X SSC
0.1 % SDS
10 mg/ml BSA

2. The array slide may then be washed four times with H2O and once in isopropanol
and dried 5 minutes at room temperature.

3. The following mixture may then be prepared:

50% v/v Formamide
5X SSC
0.1 % SDS
0.5 mg/ml PolyA RNA
0.5 mg/ml Yeast tRNA
0.5 mg/ml Salmon Sperm DNA (10-30 µg)
50 µg/ml Cot1 DNA
combined Cy3 and Cy5 probes

(Total volume 45 µl)



4. The mixture may then be heated at 95°C for 5 minutes and chilled on wet ice for 3
minutes.

5. The mixture may be applied to a cover slip and the pre-warmed (42°C) array slide
(arrayed material facing downwards) lowered onto the cover slip to the point when it
is just possible to lift the cover slip up with surface tension.

6. The slide may be placed into a moisturised slide hybridisation chamber and
incubated at 42°C overnight (<16 hr).

7. Following hybridisation the entire slide may be immersed in 2X SSC and the
cover slip removed.

8. The exposed slide may then be washed twice with 2X SSC/0.1% SDS (5 minutes RT
each wash) followed by 2 washes with 2X SSC (5 minutes RT each wash) and drying at
room temperature.



(f) Detection of hybridisation.



The following protocol is suitable for detection and analysis of hybridisation in the 
method of the invention. 



1. Scanning of the slide and quantification of red (Cy5 635 nm) and green (Cy3
532 nm) fluorescence may be carried out using a GenePix 4000b as recommended
by the manufacturer.

2. Following scanning data may be analysed using commercially available software.
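
One common way to summarise two-colour data of this kind is a log2 ratio of the background-corrected red and green signals for each spot. The sketch below is a generic illustration and is not the output format or API of any scanner software; the function name, the `background` correction and the `floor` value are assumptions.

```python
import math


def log2_ratio(cy5, cy3, background=0.0, floor=1.0):
    """Log2 of the Cy5/Cy3 signal ratio for one spot.

    A constant background is subtracted from both channels, and each
    corrected signal is clamped to a small positive floor so that the
    ratio is always defined.
    """
    red = max(cy5 - background, floor)
    green = max(cy3 - background, floor)
    return math.log2(red / green)


# A spot four times brighter in the Cy5 channel gives +2; equal signals give 0.
print(log2_ratio(4000, 1000))
print(log2_ratio(500, 500))
```

Positive values indicate stronger hybridisation of the Cy5-labelled probe library to that reference sample, negative values the Cy3-labelled one.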



(g) Complexity reduction.

There are many possible complexity reduction techniques that are suitable for use with the
method of the invention.

(i) Display based

The following protocol is suitable for effecting a "display products" complexity reduction 
technique according to the method of the invention. The protocol provides for the
preparation of two different amplified cDNA populations from the same cDNA
population ("total cDNA"). 

Selected subsets of cDNA within a global amplified total cDNA population may be 
further amplified based on protocols described in: 

Candeliere, G. A., Rao, Y., Floh, A., Sandler, S. D., and Aubin, J. E. (1999). cDNA
fingerprinting of osteoprogenitor cells to isolate differentiation stage-specific genes. 
Nucleic Acids Research 27, 1079-83. 

A suitable protocol is as follows: 

1. Purified globally amplified total cDNA prepared as described above may be
diluted 100 fold in 2 mM Tris pH 7.5, 0.2 mM EDTA.

2. Two separate subsets of cDNAs may then be selectively amplified from the total
cDNA by separately adding 10 µl of total cDNA to 10 µl of PCR mixture A and
10 µl of total cDNA to 10 µl of PCR mixture B, and subjecting both to:

2 cycles as follows:
94°C 1 minute;
35°C 3 minutes;
72°C 3 minutes;

followed by 30 cycles as follows:
94°C 30 seconds;
50°C 30 seconds;
72°C 1 minute; and

1 cycle as follows:
72°C 5 minutes.

PCR mixture A

25 µM Display Oligo A - CAGCCAGTCTTGAGGCAACACC
0.5 mM dNTPs (Sigma)
32 mM (NH4)2SO4
134 mM Tris-HCl (pH 8.8 at 25°C)
0.01% Tween-20
3 mM MgCl2
25 u/ml Taq Polymerase

PCR mixture B

25 µM Display Oligo B - CCAGCAAGAGCACAAGAGGAAGAG
0.5 mM dNTPs (Sigma)
32 mM (NH4)2SO4
134 mM Tris-HCl (pH 8.8 at 25°C)
0.01% Tween-20
3 mM MgCl2
25 u/ml Taq Polymerase

Following PCR all samples may be purified using GFX purification columns (Amersham
Pharmacia) following the manufacturer's instructions.
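Because 10 µl of diluted cDNA is added to 10 µl of PCR mixture, each component of the mixture is present in the reaction at half its listed concentration. The following sketch shows the arithmetic. It assumes that the listed concentrations are those of the mixture before the cDNA is added, and it ignores the small amounts of Tris and EDTA carried over with the diluted cDNA; the function name is hypothetical.

```python
# Concentrations of PCR mixture A as listed in the protocol: (value, unit).
PCR_MIXTURE_A = {
    "Display Oligo A": (25.0, "uM"),
    "dNTPs": (0.5, "mM"),
    "(NH4)2SO4": (32.0, "mM"),
    "Tris-HCl (pH 8.8)": (134.0, "mM"),
    "Tween-20": (0.01, "%"),
    "MgCl2": (3.0, "mM"),
    "Taq polymerase": (25.0, "u/ml"),
}


def final_concentrations(mixture, mixture_ul, sample_ul):
    """Scale each listed concentration by the fraction of the final
    volume that is contributed by the PCR mixture."""
    if mixture_ul <= 0 or sample_ul < 0:
        raise ValueError("volumes must be positive")
    factor = mixture_ul / (mixture_ul + sample_ul)
    return {
        name: (value * factor, unit)
        for name, (value, unit) in mixture.items()
    }


if __name__ == "__main__":
    # 10 ul of PCR mixture A plus 10 ul of diluted total cDNA.
    for name, (value, unit) in final_concentrations(
        PCR_MIXTURE_A, 10.0, 10.0
    ).items():
        print(f"{name}: {value:g} {unit}")
```

On this assumption the reaction contains 16 mM (NH4)2SO4, 67 mM Tris-HCl and 1.5 mM MgCl2, the same buffer concentrations as are listed for the tracer and driver reactions of the hybridisation protocols.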



(ii) Hybridisation depletion and enrichment

The term driver refers to the cDNA used to deplete and/or enrich the tracer cDNA
population. The resultant depleted or enriched sequences will be derived from the tracer
cDNA population. In the following examples all driver cDNAs are prepared in PCR
reactions containing dUTP (not dTTP) to allow removal of residual driver cDNAs using
the dUTP specific UNG nuclease.
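The purpose of preparing driver with dUTP may be pictured with a toy model. In the sketch below, which is illustrative only, molecules are represented as strings in which "U" marks a position filled by dUTP, and UNG treatment is reduced to the rule that any uracil-containing strand is removed. The function names are hypothetical and the enzymology is deliberately simplified.

```python
def synthesise_driver(template: str) -> str:
    """Write a sequence as it would appear if dUTP replaced dTTP."""
    return template.upper().replace("T", "U")


def ung_treat(molecules):
    """Keep only strands that contain no uracil; in this toy model
    uracil-containing strands are treated as destroyed."""
    return [m for m in molecules if "U" not in m]


if __name__ == "__main__":
    tracer = ["GATTACA", "CCGGAA"]          # made with dTTP
    driver = [synthesise_driver(s) for s in ["GATTACA", "TTGCA"]]
    pool = tracer + driver                   # residual driver mixed with tracer
    print(ung_treat(pool))                   # only tracer strands remain
```

A driver strand containing no thymidine positions at all would escape removal in this model; for cDNA of realistic length that case can be neglected.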

Based on methods described in:

Brady, G., Billia, F., Knox, J., Hoang, T., Kirsch, I. R., Voura, E. B., Hawley, R. G.,
Cumming, R., Buchwald, M., Siminovitch, K., Miyamoto, N., Boehmelt, G., and Iscove, N. N.
(1995). Analysis of gene expression in a complex differentiation hierarchy by global
amplification of cDNA from single cells. Current Biology 5, 909-922.

Foot, H. C. C., Brady, G., and Franklin, F. C. H. (1996). Subtractive Hybridisation. In Plant
Molecular Biology Laboratory Manual, M. Clark, ed. (London: Springer Verlag).

Weaver, D. L., Núñez, C., Brunet, C., Bostock, V., and Brady, G. (1999). Single-cell RT-PCR
cDNA subtraction. In Molecular Embryology: Methods and Protocols, P. Sharpe and I.
Mason, eds. (Totowa, NJ, USA: Humana Press), pp. 601-609.

Depletion/Subtraction 

1. Preparation of tracer and driver:

Tracer

Approximately 0.5 ng of globally amplified cDNA added to a 20-100 µl reaction
containing:

250 nM dATP, dTTP, dCTP, dGTP
1 µM Oligo NotIdT (sequence
CATCTCGAGCGGCCGCTTTTTTTTTTTTTTTTTTTTTTTT)
16 mM (NH4)2SO4
67 mM Tris-HCl (pH 8.8 at 25°C)
1.5 mM MgCl2
0.01% Tween-20
0.16 u/µl Taq Polymerase



Driver

Approximately 0.5 ng of globally amplified cDNA added to a 20-100 µl reaction
containing:

250 nM dATP, dUTP, dCTP, dGTP
1 µM Oligo NotIdT (sequence
CATCTCGAGCGGCCGCTTTTTTTTTTTTTTTTTTTTTTTT)
16 mM (NH4)2SO4
67 mM Tris-HCl (pH 8.8 at 25°C)
1.5 mM MgCl2
0.01% Tween-20
0.16 u/µl Taq Polymerase



Both tracer and driver are then placed into a PCR machine and subjected to:

25 cycles as follows:
94°C 30 seconds;
42°C 1 minute;
72°C 2 minutes.



Following completion of the PCR reaction both tracer and driver cDNAs are then
purified using commercial purification systems such as GFX (Amersham Pharmacia).

Biotinylation of Driver. 

Place 20-50 µl driver DNA (2-50 µg) in a 1.5 ml screw-cap tube. Boil for 2 minutes and
place directly on ice in a small ice tray + rack.

Add 20 µl of 2 mg/ml photobiotin to the DNA and mix well. With the lids left off, place the
tubes upright on ice 10 cm from the bulb and irradiate for a total of 10 minutes. After the
first 5 minutes remove the tubes from under the light source (avoid direct irradiation),
flick the tube to mix and replace under the light source for the remaining 5 minutes.

Remove the sample (avoid direct irradiation) and mix in the remaining 20 µl of
photobiotin and place under the light for an additional 5 minutes.

Add 1/10th volume of 1 M Tris-Cl, pH 8.0 to stop the reaction.

Purify using commercial purification systems such as GFX (Amersham Pharmacia).

2. Hybridisation of tracer plus driver and tracer enrichment:

To a 0.5 ml tube add and mix in this order:

0.5 µg tracer DNA
10 µg biotinylated driver DNA

adjust volume to 20 µl with water then add:

8 µl 5xHyb GEH
12 µl 40% PEG



Heat sample:

5 minutes at 95°C,
5 minutes at 80°C,
7 minutes at 74°C,
60 minutes at 68°C

then hold at 68°C while separating biotinylated molecules



Remove biotinylated molecules using avidin bound to a solid support. In practice this can
be carried out using commercial products as directed by the manufacturer, such as
Streptavidin Magnasphere™ Paramagnetic particles (SA-PMPs) provided by Promega.

Following removal of biotinylated molecules the remaining tracer can be subjected to
further rounds of subtraction by addition of fresh biotinylated driver DNA and repeating
the process described above. Typically three sequential rounds of subtraction are used but
additional rounds may be added if required.
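The benefit of sequential rounds may be illustrated with simple arithmetic. Assuming, purely for illustration, that each round removes a fixed fraction of the tracer sequences that are shared with the driver, the fraction remaining falls geometrically with the number of rounds. The per-round efficiency used below is a hypothetical value and is not a figure taken from this protocol.

```python
def residual_fraction(efficiency: float, rounds: int) -> float:
    """Fraction of shared tracer sequences left after a number of
    subtraction rounds, assuming each round removes the same fraction."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if rounds < 0:
        raise ValueError("rounds must not be negative")
    return (1.0 - efficiency) ** rounds


if __name__ == "__main__":
    assumed_efficiency = 0.9  # hypothetical per-round removal
    for n in range(1, 5):
        left = residual_fraction(assumed_efficiency, n)
        print(f"after {n} round(s): {left:.4%} of shared sequences remain")
```

In practice the efficiency will vary between sequences and between rounds, so the model indicates only the general trend.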

The final depleted product is then amplified using PCR conditions described for the
original tracer amplification.



5xHyb GEH

90 mM EPPS pH 8.5
10 mM EDTA pH 8.0
0.5% Triton X-100
3.75 M NaCl
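The final composition of the hybridisation mixture follows from the volumes given above: 20 µl of DNA in water, 8 µl of 5xHyb GEH and 12 µl of 40% PEG give 40 µl in total, so the buffer is present at its 1x concentration. The sketch below carries out this calculation. The constant and function names are hypothetical, and any buffer carried over with the purified DNA is ignored.

```python
# Composition of 5xHyb GEH as listed in the protocol: (value, unit).
HYB_GEH_5X = {
    "EPPS pH 8.5": (90.0, "mM"),
    "EDTA pH 8.0": (10.0, "mM"),
    "Triton X-100": (0.5, "%"),
    "NaCl": (3.75, "M"),
}

DNA_UL = 20.0             # tracer plus driver, adjusted with water
HYB_UL = 8.0              # 5xHyb GEH
PEG_UL = 12.0             # 40% PEG stock
PEG_STOCK_PERCENT = 40.0


def hybridisation_mix():
    """Return the total volume (ul) and final component concentrations."""
    total = DNA_UL + HYB_UL + PEG_UL
    final = {
        name: (value * HYB_UL / total, unit)
        for name, (value, unit) in HYB_GEH_5X.items()
    }
    final["PEG"] = (PEG_STOCK_PERCENT * PEG_UL / total, "%")
    return total, final


if __name__ == "__main__":
    total, final = hybridisation_mix()
    print(f"total volume: {total:g} ul")
    for name, (value, unit) in final.items():
        print(f"{name}: {value:g} {unit}")
```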



Negative Subtraction or Attraction

1. Preparation of tracer and driver:

Tracer

Approximately 0.5 ng of globally amplified cDNA added to a 20-100 µl reaction
containing:

250 nM dATP, dTTP, dCTP, dGTP
1 µM Oligo NotIdT (sequence
CATCTCGAGCGGCCGCTTTTTTTTTTTTTTTTTTTTTTTT)
16 mM (NH4)2SO4
67 mM Tris-HCl (pH 8.8 at 25°C)
1.5 mM MgCl2
0.01% Tween-20
0.16 u/µl Taq Polymerase



Driver

Approximately 0.5 ng of globally amplified cDNA added to a 20-100 µl reaction
containing:

250 nM dATP, dUTP, dCTP, dGTP
1 µM Oligo NotIdT (sequence
CATCTCGAGCGGCCGCTTTTTTTTTTTTTTTTTTTTTTTT)
16 mM (NH4)2SO4
67 mM Tris-HCl (pH 8.8 at 25°C)
1.5 mM MgCl2
0.01% Tween-20
0.16 u/µl Taq Polymerase

Both tracer and driver are then placed into a PCR machine and subjected to:

25 cycles as follows:
94°C 30 seconds;
42°C 1 minute;
72°C 2 minutes.



Following completion of the PCR reaction both tracer and driver cDNAs are then
purified using commercial purification systems such as GFX (Amersham Pharmacia).

Biotinylation of Driver 

Place 20-50 µl driver DNA (2-50 µg) in a 1.5 ml screw-cap tube. Boil for 2 minutes and
place directly on ice in a small ice tray + rack.

Add 20 µl of 2 mg/ml photobiotin to the DNA and mix well. With the lids left off, place the
tubes upright on ice 10 cm from the bulb and irradiate for a total of 10 minutes. After the
first 5 minutes remove the tubes from under the light source (avoid direct irradiation),
flick the tube to mix and replace under the light source for the remaining 5 minutes.

Remove the sample (avoid direct irradiation) and mix in the remaining 20 µl of
photobiotin and place under the light for an additional 5 minutes.

Add 1/10th volume of 1 M Tris-Cl, pH 8.0 to stop the reaction.

Purify using commercial purification systems such as GFX (Amersham Pharmacia).

2. Hybridisation of tracer plus driver and tracer enrichment:



To a 0.5 ml tube add and mix in this order:

0.5-10 µg tracer DNA
10 µg biotinylated driver DNA 1

adjust volume to 20 µl with water then add:

8 µl 5xHyb GEH
12 µl 40% PEG

Heat sample:

5 minutes at 95°C,
5 minutes at 80°C,
7 minutes at 74°C,
60 minutes at 68°C

then hold at 68°C while separating biotinylated molecules

Remove biotinylated molecules using avidin bound to a solid support. In practice this can
be carried out using commercial products as directed by the manufacturer, such as
Streptavidin Magnasphere™ Paramagnetic particles (SA-PMPs) provided by Promega.

Release tracer DNA bound to driver DNA 1 by denaturing the driver DNA 1/tracer DNA
hybrids. For example, using SA-PMPs, the washed SA-PMPs and their attendant driver
DNA 1/tracer DNA hybrids can be heated to 96°C to release tracer DNA, and bound driver
DNA 1 removed by magnetic attraction of the SA-PMPs.

Released tracer DNA can then be subjected to further rounds of attraction by repeating the
process with separate drivers (driver DNAs 2, 3, 4 etc.).

The final "attracted" product will be enriched for sequences common to all driver DNAs
used and can be amplified using PCR conditions described for the original tracer
amplification.
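In idealised form, successive rounds of attraction act as a set intersection: a tracer sequence is retained only if it is also present in every driver used. The sketch below illustrates this under the assumptions that hybridisation and capture are complete in every round and that each sequence can be treated as a simple label. The gene labels and the function name are hypothetical.

```python
def attract(tracer, drivers):
    """Idealised attraction: keep the tracer sequences that are
    present in each driver population, one round per driver."""
    selected = set(tracer)
    for driver in drivers:
        selected &= set(driver)   # one round of attraction
    return selected


if __name__ == "__main__":
    tracer = {"geneA", "geneB", "geneC", "geneD"}
    driver_1 = {"geneA", "geneB", "geneX"}
    driver_2 = {"geneA", "geneB", "geneC"}
    driver_3 = {"geneB", "geneY"}
    print(attract(tracer, [driver_1, driver_2, driver_3]))
```

Real hybridisations are incomplete, so the product is enriched for, rather than restricted to, the shared sequences.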

5xHyb GEH

90 mM EPPS pH 8.5
10 mM EDTA pH 8.0
0.5% Triton X-100
3.75 M NaCl

CLAIMS

1. A collection of labelled target DNA molecules which are exonuclease derivatives
of cDNA.

2. A collection according to claim 1, wherein the cDNA is globally amplified cDNA. 

3. A collection according to claim 1 or claim 2, wherein the DNA molecules are
labelled by incorporation of labelled nucleotides.

4. A collection according to any preceding claim, wherein the DNA molecules are 
fluorescently labelled. 

5. A collection according to any preceding claim, wherein the labelled target DNA
molecules are prepared from cDNA by a complexity reduction technique.

6. A collection according to claim 5, wherein the complexity reduction technique 
comprises a restriction digestion technique. 

7. A collection according to claim 5, wherein the complexity reduction technique 
comprises a subtraction technique. 

8. A collection according to claim 5, wherein the complexity reduction technique 
comprises a cDNA display technique. 

9. A method of producing a collection of labelled target DNA molecules according 
to claim 1, comprising: 

(i) subjecting cDNA, or a derivative thereof, to exonuclease digestion to produce a 
collection of essentially single-stranded DNA molecules; and 

(ii) labelling the single-stranded molecules. 




10. A method according to claim 9, wherein the single-stranded molecules are labelled 
by the action of terminal transferase in the presence of labelled nucleotides. 

11. A method of producing a collection of labelled target DNA molecules according 
to claim 1, comprising: 

(i) treating double-stranded cDNA, or a derivative thereof, to obtain a labelled 
double-stranded DNA population; and 

(ii) effecting exonuclease digestion of the labelled population to produce a collection 
of essentially single-stranded labelled DNA molecules. 

12. A method according to claim 11, wherein the labelled double-stranded DNA
population is prepared by the action of terminal transferase in the presence of labelled
nucleotides.

13. A method according to claim 11, wherein the labelled double-stranded DNA
population is prepared by PCR in the presence of labelled nucleotides.

14. A kit for the preparation of a collection of labelled target DNA molecules 
according to claim 1, the kit comprising: 

(i) an exonuclease; 

(ii) terminal transferase; and 

(iii) labelled nucleotides. 

15. A kit for the preparation of a collection of labelled target DNA molecules 
according to claim 1, the kit comprising: 

(i) an exonuclease; 

(ii) primers; and 

(iii) labelled nucleotides.

16. A kit according to claim 15, further comprising reagents for PCR.

17. A kit according to any one of claims 14 to 16, wherein the labelled nucleotides are
fluorescently labelled.



Figure 1.